Approximate Homogeneous Graph Summarization
Abstract
Graph patterns can represent complex structural relations among objects in applications across many domains. The objective of graph summarization is to obtain a concise representation of a single large graph that is interpretable and suitable for analysis. A good summary can reveal hidden relationships between nodes in the graph. The key issue is how to construct a high-quality, representative super-graph GS, in which a super-node summarizes a collection of nodes of G based on the similarity of their attribute values and neighborhood relationships, and a super-edge summarizes the edges between nodes in G that are represented by two different super-nodes in GS. We propose an entropy-based unified model for measuring the homogeneity of the super-graph. Because the best summary in terms of homogeneity may be too large to explore, we use the unified model to relax three summarization criteria and obtain an approximate homogeneous summary of reasonable size. We propose both agglomerative and divisive algorithms for approximate summarization, together with pruning techniques and heuristics for both algorithms to reduce computation cost. Experimental results confirm that our approaches can efficiently generate high-quality summaries.
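To make the notions of super-node, super-edge, and entropy-based homogeneity concrete, the following is a minimal sketch. It is not the unified model of the paper, whose exact definition is not given in this abstract. It assumes one categorical attribute per node, uses size-weighted Shannon entropy as a stand-in homogeneity score (lower means more homogeneous), and all function names (`entropy`, `attribute_entropy`, `super_edges`) are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a non-empty list of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def attribute_entropy(partition, attrs):
    """Size-weighted average attribute entropy over all super-nodes.

    partition: list of node groups, one group per super-node (assumed format).
    attrs: dict mapping each node to a single categorical attribute value.
    A score of 0 means every super-node is perfectly homogeneous.
    """
    n = sum(len(group) for group in partition)
    return sum(len(group) / n * entropy([attrs[v] for v in group])
               for group in partition)

def super_edges(partition, edges):
    """Count the edges of G that run between two different super-nodes."""
    owner = {v: i for i, group in enumerate(partition) for v in group}
    counts = Counter()
    for u, v in edges:
        a, b = owner[u], owner[v]
        if a != b:  # edges inside one super-node do not form a super-edge
            counts[(min(a, b), max(a, b))] += 1
    return dict(counts)

# Toy graph G: four nodes, one attribute each, four undirected edges.
attrs = {1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

pure = [[1, 2], [3, 4]]    # groups nodes with equal attribute values
mixed = [[1, 3], [2, 4]]   # groups nodes with different attribute values

print(attribute_entropy(pure, attrs))   # homogeneous grouping
print(attribute_entropy(mixed, attrs))  # heterogeneous grouping
print(super_edges(pure, edges))
```

Under these assumptions, the `pure` grouping scores lower than the `mixed` one. An agglomerative strategy in this spirit would repeatedly merge the pair of super-nodes whose merge raises the score the least, and a divisive one would split the least homogeneous super-node; the paper's actual criteria and pruning rules are not reproduced here.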
Similar resources
Graph Hybrid Summarization
One approach to processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. To generate a better-quality summary of a given attributed graph, both structural and attribute similarities must be considered. There are two measures named density and entropy to evaluate the quality of structural and at...
Multi-layered graph-based multi-document summarization model
Multi-document summarization is the process of automatically generating a compressed version of a given collection of documents. Recently, graph-based models and ranking algorithms have been actively investigated by the extractive document summarization community. While most work to date focuses on homogeneous connectedness of sentences and heterogeneous connectedness of documents and sentence...
Weighted Theta Functions and Embeddings with Applications to Max-Cut, Clustering and Summarization
We introduce a unifying generalization of the Lovász theta function, and the associated geometric embedding, for graphs with weights on both nodes and edges. We show how it can be computed exactly by semidefinite programming, and how to approximate it using SVM computations. We show how the theta function can be interpreted as a measure of diversity in graphs and use this idea, and the graph em...
TLR at DUC 2006: approximate tree similarity and a new evaluation regime
We propose modifications to a summarization system that is based on computing the tree edit distance between dependency parse trees of reformulated questions and candidate sentences. We modify a recently introduced approximate tree edit distance metric by using mutual information between stemmed words for similarity matching of sub-trees. We also propose an approximate way of deriving anaphoric...
Tuple Graph Synopses for Relational Data Sets
This paper introduces the Tuple Graph (TuG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summarization framework adopts a “semi-structured” view of the relational database, modeling a relational data set as a graph of tuples and join queries as graph traversals respectively. The key idea is to approximate the str...
Journal title: JIP
Volume: 20
Publication date: 2012